A Fault Tolerant Scheduling System Based on Checkpointing for Computational Grids
Abstract
Job checkpointing is one of the most commonly used techniques for providing fault tolerance in computational grids. The efficiency of checkpointing depends on the choice of the checkpoint interval: an inappropriate interval can delay job execution. In this paper, a fault-tolerant job scheduling system based on the checkpointing technique is presented and evaluated. When scheduling a job, the system combines the average failure time and failure rate of grid resources with their response time to generate scheduling decisions. The system uses the failure rate of the assigned resources to calculate the checkpoint interval for each job. Extensive simulation experiments are conducted to quantify the performance of the proposed system. The experiments show that the proposed system can considerably improve throughput, turnaround time and failure tendency.
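The abstract does not state how the checkpoint interval is derived from the failure rate. As a rough illustration only, the sketch below uses Young's first-order approximation, interval = sqrt(2 * C / lambda), which is a well-known rule of thumb and not necessarily the formula used in the paper. The function name, the checkpoint cost, and the failure rates are all hypothetical values chosen for the example.

```python
import math


def young_interval(checkpoint_cost_s: float, failure_rate_per_s: float) -> float:
    """Checkpoint interval in seconds from Young's approximation.

    Assumption: this is a stand-in formula, not the paper's own method.
    checkpoint_cost_s  -- time to write one checkpoint (C), in seconds
    failure_rate_per_s -- failures per second of the assigned resource (lambda)
    """
    if checkpoint_cost_s <= 0 or failure_rate_per_s <= 0:
        raise ValueError("checkpoint cost and failure rate must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s / failure_rate_per_s)


# Hypothetical resources: one fails about every 10 hours, the other every hour.
reliable = young_interval(checkpoint_cost_s=30.0, failure_rate_per_s=1 / 36000)
flaky = young_interval(checkpoint_cost_s=30.0, failure_rate_per_s=1 / 3600)

print(f"reliable resource: checkpoint every {reliable:.0f} s")
print(f"flaky resource:    checkpoint every {flaky:.0f} s")
```

Under this approximation, a resource with a higher failure rate gets a shorter interval, so less work is lost per failure at the price of more checkpoint overhead.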
Similar resources
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...
Providing Fault-Tolerance in Unreliable Grid Systems Through Adaptive Checkpointing and Replication
As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of applications. Two well-known techniques for providing fault-tolerance in grids are periodic task checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce ...
Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid
A computational grid environment, due to its heterogeneous, autonomous and dynamic nature, is prone to different kinds of faults, which may delay the completion of a job or even force the job to restart from its starting point. The checkpointing mechanism plays a vital role in making the grid more reliable, cost-effective and efficient. In this paper, we have proposed schemes based on system checkpointing and ...
Analysis of checkpointing for schedulability of real-time systems
Checkpointing is a relatively cost-effective method for achieving fault tolerance in real-time systems. Since checkpointing schemes depend on time redundancy, they could affect the correctness of the system by causing deadlines to be missed. This paper provides exact schedulability tests for fault-tolerant task sets under a specified failure hypothesis and employing checkpointing to assist in fau...
Fault-Tolerant Multiuser Computational Grids Based on Tuple Spaces
This paper proposes GridTS, a grid infrastructure in which the resources select the tasks they execute, instead of a scheduler finding resources for the tasks. This solution allows scheduling decisions to be made with up-to-date information about the resources. GridTS provides fault-tolerant scheduling by combining a set of fault tolerance techniques to tolerate crash faults in any components o...